AITopics | lexical simplification

Collaborating Authors

lexical simplification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs

Hayakawa, Akio, Bott, Stefan, Saggion, Horacio

arXiv.org Artificial IntelligenceSep-30-2025

Despite their strong performance, large language models (LLMs) face challenges in real-world application of lexical simplification (LS), particularly in privacy-sensitive and resource-constrained environments. Moreover, since vulnerable user groups (e.g., people with disabilities) are one of the key target groups of this technology, it is crucial to ensure the safety and correctness of the output of LS systems. To address these issues, we propose an efficient framework for LS systems that utilizes small LLMs deployable in local environments. Within this framework, we explore knowledge distillation with synthesized data and in-context learning as baselines. Our experiments in five languages evaluate model outputs both automatically and manually. Our manual analysis reveals that while knowledge distillation boosts automatic metric scores, it also introduces a safety trade-off by increasing harmful simplifications. Importantly, we find that the model's output probability is a useful signal for detecting harmful simplifications. Leveraging this, we propose a filtering strategy that suppresses harmful simplifications while largely preserving beneficial ones. This work establishes a benchmark for efficient and safe LS with small LLMs. It highlights the key trade-offs between performance, efficiency, and safety, and demonstrates a promising approach for safe real-world deployment.

large language model, machine learning, simplification, (17 more...)

arXiv.org Artificial Intelligence

2509.25086

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Redefining Simplicity: Benchmarking Large Language Models from Lexical to Document Simplification

Qiang, Jipeng, Huang, Minjiang, Zhu, Yi, Yuan, Yunhao, Zhang, Chaowei, Yu, Kui

arXiv.org Artificial IntelligenceFeb-12-2025

Text simplification (TS) refers to the process of reducing the complexity of a text while retaining its original meaning and key information. Existing work only shows that large language models (LLMs) have outperformed supervised non-LLM-based methods on sentence simplification. This study offers the first comprehensive analysis of LLM performance across four TS tasks: lexical, syntactic, sentence, and document simplification. We compare lightweight, closed-source and open-source LLMs against traditional non-LLM methods using automatic metrics and human evaluations. Our experiments reveal that LLMs not only outperform non-LLM approaches in all four tasks but also often generate outputs that exceed the quality of existing human-annotated references. Finally, we present some future directions of TS in the era of LLMs.

large language model, machine learning, simplification, (16 more...)

arXiv.org Artificial Intelligence

2502.08281

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Add feedback

Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets

Barbu, Eduard, Muru, Meeri-Ly, Malva, Sten Marcus

arXiv.org Artificial IntelligenceJan-26-2025

This study introduces an approach to Estonian text simplification using two model architectures: a neural machine translation model and a fine-tuned large language model (LLaMA). Given the limited resources for Estonian, we developed a new dataset, the Estonian Simplification Dataset, combining translated data and GPT-4.0-generated simplifications. We benchmarked OpenNMT, a neural machine translation model that frames text simplification as a translation task, and fine-tuned the LLaMA model on our dataset to tailor it specifically for Estonian simplification. Manual evaluations on the test set show that the LLaMA model consistently outperforms OpenNMT in readability, grammaticality, and meaning preservation. These findings underscore the potential of large language models for low-resource languages and provide a basis for further research in Estonian text simplification.

large language model, machine learning, simplification, (16 more...)

arXiv.org Artificial Intelligence

2501.15624

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Estonia > Tartu County > Tartu (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(10 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.88)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

New Evaluation Paradigm for Lexical Simplification

Qiang, Jipeng, Huang, Minjiang, Zhu, Yi, Yuan, Yunhao, Zhang, Chaowei, Ouyang, Xiaoye

arXiv.org Artificial IntelligenceJan-25-2025

Lexical Simplification (LS) methods use a three-step pipeline: complex word identification, substitute generation, and substitute ranking, each with separate evaluation datasets. We found large language models (LLMs) can simplify sentences directly with a single prompt, bypassing the traditional pipeline. However, existing LS datasets are not suitable for evaluating these LLM-generated simplified sentences, as they focus on providing substitutes for single complex words without identifying all complex words in a sentence. To address this gap, we propose a new annotation method for constructing an all-in-one LS dataset through human-machine collaboration. Automated methods generate a pool of potential substitutes, which human annotators then assess, suggesting additional alternatives as needed. Additionally, we explore LLM-based methods with single prompts, in-context learning, and chain-of-thought techniques. We introduce a multi-LLMs collaboration approach to simulate each step of the LS task. Experimental results demonstrate that LS based on multi-LLMs approaches significantly outperforms existing baselines.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.15268

Country:

North America > Guatemala (0.04)
Asia > China > Jiangsu Province (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Large Language Models to generate Easy to Read content

Martínez, Paloma, Moreno, Lourdes, Ramos, Alberto

arXiv.org Artificial IntelligenceJul-29-2024

Ensuring text accessibility and understandability are essential goals, particularly for individuals with cognitive impairments and intellectual disabilities, who encounter challenges in accessing information across various mediums such as web pages, newspapers, administrative tasks, or health documents. Initiatives like Easy to Read and Plain Language guidelines aim to simplify complex texts; however, standardizing these guidelines remains challenging and often involves manual processes. This work presents an exploratory investigation into leveraging Artificial Intelligence (AI) and Natural Language Processing (NLP) approaches to systematically simplify Spanish texts into Easy to Read formats, with a focus on utilizing Large Language Models (LLMs) for simplifying texts, especially in generating Easy to Read content. The study contributes a parallel corpus of Spanish adapted for Easy To Read format, which serves as a valuable resource for training and testing text simplification systems. Additionally, several text simplification experiments using LLMs and the collected corpus are conducted, involving fine-tuning and testing a Llama2 model to generate Easy to Read content. A qualitative evaluation, guided by an expert in text adaptation for Easy to Read content, is carried out to assess the automatically simplified texts. This research contributes to advancing text accessibility for individuals with cognitive impairments, highlighting promising strategies for leveraging LLMs while responsibly managing energy usage.

generate easy, guideline, simplification, (16 more...)

arXiv.org Artificial Intelligence

2407.20046

Country:

Europe > Spain > Galicia > Madrid (0.05)
South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
Europe > Portugal (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

The SAMER Arabic Text Simplification Corpus

Alhafni, Bashar, Hazim, Reem, Liberato, Juan Piñeros, Khalil, Muhamed Al, Habash, Nizar

arXiv.org Artificial IntelligenceApr-29-2024

We present the SAMER Corpus, the first manually annotated Arabic parallel corpus for text simplification targeting school-aged learners. Our corpus comprises texts of 159K words selected from 15 publicly available Arabic fiction novels most of which were published between 1865 and 1955. Our corpus includes readability level annotations at both the document and word levels, as well as two simplified parallel versions for each text targeting learners at two different readability levels. We describe the corpus selection process, and outline the guidelines we followed to create the annotations and ensure their quality. Our corpus is publicly available to support and encourage research on Arabic text simplification, Arabic automatic readability assessment, and the development of Arabic pedagogical language technologies.

computational linguistic, proceedings, simplification, (15 more...)

arXiv.org Artificial Intelligence

2404.18615

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
(31 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish

Bott, Stefan, Saggion, Horacio, Rojas, Nelson Peréz, Salazar, Martin Solis, Ramirez, Saul Calderon

arXiv.org Artificial IntelligenceApr-11-2024

Automatic lexical simplification is a task to substitute lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. This dataset represents the first of its kind in Catalan and a substantial addition to the sparse data on automatic lexical simplification which is available for Spanish. Specifically, MultiLS-SP is the first dataset for Spanish which includes scalar ratings of the understanding difficulty of lexical items. In addition, we describe experiments with this dataset, which can serve as a baseline for future work on the same data.

dataset, saggion, simplification, (13 more...)

arXiv.org Artificial Intelligence

2404.07814

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Bulgaria > Varna Province > Varna (0.04)
North America > United States > Maryland (0.04)
(9 more...)

Genre: Research Report (0.40)

Industry:

Government (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

An LLM-Enhanced Adversarial Editing System for Lexical Simplification

Tan, Keren, Luo, Kangyang, Lan, Yunshi, Yuan, Zheng, Shu, Jinlong

arXiv.org Artificial IntelligenceMar-22-2024

Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, making it challenging to apply in low-resource scenarios. In this paper, we propose a novel LS method without parallel corpora. This method employs an Adversarial Editing System with guidance from a confusion loss and an invariance loss to predict lexical edits in the original sentences. Meanwhile, we introduce an innovative LLM-enhanced loss to enable the distillation of knowledge from Large Language Models (LLMs) into a small-size LS system. From that, complex words within sentences are masked and a Difficulty-aware Filling module is crafted to replace masked positions with simpler words. At last, extensive experimental results and analyses on three benchmark LS datasets demonstrate the effectiveness of our proposed method.

complex word, edit predictor, simplification, (14 more...)

arXiv.org Artificial Intelligence

2402.14704

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

MultiLS: A Multi-task Lexical Simplification Framework

North, Kai, Ranasinghe, Tharindu, Shardlow, Matthew, Zampieri, Marcos

arXiv.org Artificial IntelligenceFeb-22-2024

Lexical Simplification (LS) automatically replaces difficult to read words for easier alternatives while preserving a sentence's original meaning. LS is a precursor to Text Simplification with the aim of improving text accessibility to various target demographics, including children, second language learners, individuals with reading disabilities or low literacy. Several datasets exist for LS. These LS datasets specialize on one or two sub-tasks within the LS pipeline. However, as of this moment, no single LS dataset has been developed that covers all LS sub-tasks. We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset. We also present MultiLS-PT, the first dataset to be created using the MultiLS framework. We demonstrate the potential of MultiLS-PT by carrying out all LS sub-tasks of (1). lexical complexity prediction (LCP), (2). substitute generation, and (3). substitute ranking for Portuguese. Model performances are reported, ranging from transformer-based models to more recent large language models (LLMs).

candidate substitution, dataset, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2402.14972

Country:

South America > Brazil (0.05)
South America > Ecuador > Guayas Province > Guayaquil (0.04)
North America > United States > Texas (0.04)
(6 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Automatic Generation and Simplification of Children's Stories

Valentini, Maria, Weber, Jennifer, Salcido, Jesus, Wright, Téa, Colunga, Eliana, Kann, Katharina

arXiv.org Artificial IntelligenceOct-27-2023

With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic. Working toward the goal of age-appropriate simplicity in generated educational texts, we first examine the ability of several popular LLMs to generate stories with properly adjusted lexical and readability levels. We find that, in spite of the growing capabilities of LLMs, they do not yet possess the ability to limit their vocabulary to levels appropriate for younger age groups. As a second experiment, we explore the ability of state-of-the-art lexical simplification models to generalize to the domain of children's stories and, thus, create an efficient pipeline for their automatic generation. In order to test these models, we develop a dataset of child-directed lexical simplification instances, with examples taken from the LLM-generated stories in our first experiment. We find that, while the strongest-performing current lexical simplification models do not perform as well on material designed for children due to their reliance on large language models behind the scenes, some models that still achieve fairly strong results on general data can mimic or even improve their performance on children-directed data with proper fine-tuning, which we conduct using our newly created child-directed simplification dataset.

dataset, lexical simplification, simplification, (15 more...)

arXiv.org Artificial Intelligence

2310.18502

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback